Abstract


Algorithms designed for VLSI implementation are usually parallel and two-dimensional in the sense that many processing elements laid out on a silicon surface can operate simultaneously. These algorithms have typically been described by graphs or networks where nodes represent processing elements or registers and edges represent wires. Although for many purposes these traditional representations are adequate for specifying VLSI algorithms, they are not suited for manipulating algorithm designs. In this paper an algebraic representation, together with a semantics, is proposed for VLSI algorithm designs. By algebraic transformations analogous to some typically used in linear algebra, alternative but equivalent designs satisfying desirable properties such as locality and regularity in data communication can be derived. This paper describes this powerful algebra for manipulating designs, and provides a mathematical foundation for the algebraic transformations. The algebraic framework is more suitable for supporting formal manipulation of designs than the network or graph-theoretic models, especially for complex designs. As an application of the proposed algebra, the paper demonstrates its use in the design and verification of systolic algorithms.
1. Introduction

Over the past several years, many systolic algorithms have been proposed as solutions to computation-bound problems (see, e.g., [6, 10, 12, 14]). By exploiting the regularity and parallelism inherent in given problems and by employing high degrees of parallelism and pipelining, systolic algorithms implemented in VLSI achieve high performance with regular communication structures and low I/O requirements (see [12] for detailed discussions of the advantages of systolic structures). A number of prototype machines for implementing systolic algorithms, ranging from single-purpose chips [5, 9, 15], through application-oriented yet programmable systems [2, 23], to very general systems with reconfigurable interconnections [3, 19, 20], have been designed and built. More recently, building-block chips for systolic architectures have also been proposed or designed [8, 1, 18, 22], including the CMU programmable systolic chip (PSC) [7, 8]. The general question of automatically deriving systolic arrays and verifying their correctness, however, remains open, although several significant attempts have been made in this direction (see, e.g., [4, 16, 17, 21]). Instead of suggesting methods for deriving or verifying systolic designs, we provide in this paper an algebra for manipulating VLSI algorithm designs in general. With this algebra a designer is able to manipulate designs by "pushing symbols," in order to conveniently meet desirable design criteria such as locality and regularity of data communication.

Section 2 illustrates the notation and basic principles by considering the hardware implementation of a finite impulse response (FIR) filter. Two representations are proposed to specify a design, with the property that from either representation we can derive the other. The z-graph representation is close to a hardware or VLSI specification of a design, and the algebraic representation is convenient for performing algebraic transformations on a design. Starting with a design that corresponds directly to the mathematical definition of the filtering problem (and whose correctness is therefore obvious), we perform a set of algebraic transformations on its algebraic representation and obtain the algebraic representation of a systolic design, from which a systolic filtering array can be derived automatically. Section 3, the heart of this paper, provides a mathematical foundation for the algebraic transformations used in Section 2. These transformations are formally justified with respect to a proposed semantics for designs. Once justified, they become "legal" transformations that can be applied freely to any design without impairing correctness. Section 4 gives conditions for choosing transformations that yield systolic designs, and Section 5 presents another application of the algebra, namely, the derivation of a systolic infinite impulse response (IIR) filtering array. The last section contains some concluding remarks.
2. Basic Principles and Notation: Illustrated by an FIR Filtering Example

To illustrate the basic idea and notation of this paper, this section considers a concrete example, the FIR filtering problem. We will use many diagrams to make the presentation as clear as possible, although the algebraic transformations of this paper rely only on the algebraic representation. We will perform the algebraic transformations formally here and postpone their justification to Section 3.

2.1. FIR Filtering and z-Notation

Consider the following FIR filter with weights $w_1, \ldots, w_4$:

$$y_i = w_1 x_i + w_2 x_{i+1} + w_3 x_{i+2} + w_4 x_{i+3}. \qquad (2.1)$$

Figure 2-1 depicts a straightforward design, called design S, for the hardware implementation of the filter. In the diagram, each ⊗ and ⊕ represents a multiplier and an adder, respectively, and each □ or ■ represents a register capable of latching incoming data for one cycle time. Note that the cycle time must be long enough to allow data to flow from register to register, possibly performing some computations in between. One of the objectives of systolic designs is to minimize the cycle time by avoiding long communications and large numbers of computations done inside each cycle, and thus to maximize the throughput of the resulting system.


Figure 2-1. Design S (straightforward design).


Figure 2-2 describes design S (ignoring the input and output registers) with the usual z-notation, where a delay of $k$ cycles is indicated by $z^{-k}$. We see that in the z-notation the minimum cycle time is the time to perform all the operations connected by edges with label $z^0$. Thus for design S the cycle time is at least the time to perform one multiplication (assuming that four hardware multipliers are available) and one 4-input addition. In the next section we show a systolic design for which only one multiplication and one 2-input addition have to be done in each cycle.


Figure 2-2. Design S in the z-notation.

2.2. Systolic FIR Filtering and z-Graph Representation

Figure 2-3 depicts a typical systolic design for FIR filtering, called design W2 in [12]. In this design the $w_i$ stay, and the $x_i$ and $y_i$ both move systolically from left to right, but the $x_i$ move twice as slowly as the $y_i$.

Figure 2-3. Design W2: systolic FIR filtering array (a) and cell (b). Each cell performs $y_{out} \leftarrow y_{in} + w \cdot x_{in}$; $x \leftarrow x_{in}$; $x_{out} \leftarrow x$.

Note that each $x$ value passes from cell to cell without changing. Figure 2-4 depicts the systolic array in the z-notation.

Figure 2-4. Design W2 in the z-notation.

Figure 2-5. Design W2 in the z-graph representation.


By grouping every pair of multiplication and addition as one node to be executed by a separate processor, we derive the z-graph representation of the design (Figure 2-5). The z-graph representation of a systolic design has the "systolic property" that the input (the $x$ in Figure 2-5) is distributed to all the nodes ($v_1, v_2, v_3, v_4$) at different time instants, and edges between nodes have labels $z^{-k}$ with $k \ge 1$. One of the objectives of this paper is to introduce an algebra for deriving designs whose z-graph representations have the systolic property (see Section 4 below for precise conditions for a systolic design). Given a design like that of Figure 2-5, whose z-graph representation enjoys the systolic property, a corresponding systolic array design is readily obtained by simply passing the input $x$ through the nodes with appropriate delays, as depicted in Figure 2-6. It is instructive to examine the correspondence between Figures 2-3 and 2-6.


Figure 2-6. Design W2 in the graph representation.

2.3. Algebraic Representation of Design

In this and the next section we show that the systolic design W2 of the preceding section can be derived systematically by algebraic transformations analogous to some typically used in linear algebra. Our starting design is design C of Figure 2-7, which is a variant of the straightforward design, design S, of Figure 2-2. In design C the summation is distributed over a cascade of four 2-input adders, as shown in Figure 2-7. Figure 2-8 describes design C in the z-graph representation.


Figure 2-7. Design C: a variant of design S of Figure 2-2.

Figure 2-8. Design C in the z-graph representation.


Design C relies on the fact that in the filter computation (2.1) there are as many multiplications as additions. Similar designs apply to many other inner-product-like computations of this kind. Note that in design C of Figure 2-8 the edges linking nodes $v_1, v_2, v_3$ and $v_4$ all have label $z^0$, and therefore the cycle time must be long enough to perform the computations associated with all the nodes in sequence. Thus design C is not systolic. Assuming that design C in the z-graph representation (Figure 2-8) is given, our task is to transform it into the systolic design W2 of Figure 2-5 by linear algebra techniques. To this end, we formally associate the z-graph representation of design C of Figure 2-8 with the algebraic representation shown in Figure 2-9. To see the correspondence between the two representations, consider for example that


$$v_2 \leftarrow z^{0} v_3 + z^{-2} x, \qquad (2.2)$$

and

$$y = z^{0} v_1. \qquad (2.3)$$

Figure 2-9. Design C in the algebraic representation.

Consistently with Figure 2-8, (2.2) states that at any time $t$, the value of node $v_2$, $v_2(t)$, depends on the value of node $v_3$ at time $t$, $v_3(t)$, and the value of input $x$ at time $t-2$, $x(t-2)$; and (2.3) states that the value of output $y$ is the same as the value of node $v_1$ at any time. More precisely,

$$v_2(t) = f_2[v_3(t), x(t-2)], \qquad (2.4)$$

where $f_2$ is a two-variable function associated with $v_2$ such that $f_2[a, b] = a + w_2 b$.

This defines a one-to-one correspondence between the z-graph representation of a design and its algebraic representation, in the sense that from either representation one can derive the other. Note that the plus sign in (2.2) represents some combination of information, given by (2.4), rather than the usual arithmetic addition. Semantics for algebraic expressions involving the symbol $\leftarrow$, such as (2.2), will be given in Section 3 below.
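To make the semantics of (2.2) and (2.4) concrete, here is a small Python sketch that evaluates design C cycle by cycle and compares it with the filter definition (2.1). It assumes our reading of Figures 2-8 and 2-9, namely $v_1(t) = f_1[v_2(t), x(t-3)]$, $v_2(t) = f_2[v_3(t), x(t-2)]$, $v_3(t) = f_3[v_4(t), x(t-1)]$ and $v_4(t) = f_4[x(t)]$, with $f_i[a, b] = a + w_i b$ and $f_4[b] = w_4 b$. It also assumes that $x(t)$ is 0 outside the supplied samples. The function names are illustrative only.

```python
def simulate_design_c(w, xs):
    """Evaluate design C cycle by cycle; returns y(t) = v1(t) for t = 0..len(xs)-1."""
    w1, w2, w3, w4 = w

    def x(t):
        # Assumption: the input is 0 outside the supplied samples.
        return xs[t] if 0 <= t < len(xs) else 0

    ys = []
    for t in range(len(xs)):
        # Edges between nodes carry z^0, so within one cycle the nodes
        # must be evaluated in dependency order: v4, v3, v2, v1.
        v4 = w4 * x(t)           # v4(t) = f4[x(t)]
        v3 = v4 + w3 * x(t - 1)  # v3(t) = f3[v4(t), x(t-1)]
        v2 = v3 + w2 * x(t - 2)  # v2(t) = f2[v3(t), x(t-2)], i.e. (2.4)
        v1 = v2 + w1 * x(t - 3)  # v1(t) = f1[v2(t), x(t-3)]
        ys.append(v1)            # y(t) = v1(t), i.e. (2.3)
    return ys


def fir_reference(w, xs):
    """Direct evaluation of (2.1): y_i = w1 x_i + w2 x_{i+1} + w3 x_{i+2} + w4 x_{i+3}."""
    return [sum(w[j] * xs[i + j] for j in range(4)) for i in range(len(xs) - 3)]


w = [1, 2, 3, 4]
xs = [1, 0, 2, 5, 3]
print(simulate_design_c(w, xs)[3:])  # [27, 31]
print(fir_reference(w, xs))          # [27, 31]
```

The output at time $t$ equals $y_{t-3}$ of (2.1). Because every edge between nodes carries $z^0$, all four node updates happen inside one cycle, which is why the cycle time of design C covers all four nodes.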

It is readily seen from Figure 2-5 that the algebraic representation of design W2 is that shown in Figure 2-10.

Figure 2-10. Design W2 in the algebraic representation.


2.4. Deriving Systolic Designs by Algebraic Transformations

In this section we demonstrate that the algebraic representation of the systolic design W2 can be obtained from that of design C through formal algebraic transformations; in the next section we will provide a mathematical foundation for these transformations. To simplify notation, we denote the algebraic representation of design C by

$$v \leftarrow Av + bx, \qquad (2.5)$$

$$y = c^T v, \qquad (2.6)$$

where matrix $A$ and vectors $b$, $c$ are defined according to Figure 2-9. Consider the diagonal matrix


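$$D = \begin{pmatrix} z^{-3} & 0 & 0 & 0 \\ 0 & z^{-2} & 0 & 0 \\ 0 & 0 & z^{-1} & 0 \\ 0 & 0 & 0 & z^{0} \end{pmatrix}$$

and its "formal" inverse

$$D^{-1} = \begin{pmatrix} z^{3} & 0 & 0 & 0 \\ 0 & z^{2} & 0 & 0 \\ 0 & 0 & z^{1} & 0 \\ 0 & 0 & 0 & z^{0} \end{pmatrix}.$$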


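Let

$$u = Dv. \qquad (2.7)$$

Then

$$v = D^{-1}u. \qquad (2.8)$$

Multiplying (2.5) by $D$, we have

$$Dv \leftarrow DAv + Dbx. \qquad (2.9)$$

By (2.7) and (2.8), (2.9) and (2.6) become

$$u \leftarrow (DAD^{-1})u + (Db)x \qquad (2.10)$$

and

$$y = (c^T D^{-1})u, \qquad (2.11)$$

respectively. Through formal calculation, one can check that

$$DAD^{-1} = \begin{pmatrix} 0 & z^{-1} & 0 & 0 \\ 0 & 0 & z^{-1} & 0 \\ 0 & 0 & 0 & z^{-1} \\ 0 & 0 & 0 & 0 \end{pmatrix}, \qquad Db = \begin{pmatrix} z^{-6} \\ z^{-4} \\ z^{-2} \\ z^{0} \end{pmatrix},$$

and

$$c^T D^{-1} = \begin{pmatrix} z^{3} & 0 & 0 & 0 \end{pmatrix}.$$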


Thus (2.10) and (2.11) are the algebraic representation of the design whose z-graph representation is shown in Figure 2-11.
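The formal calculation above reduces to integer arithmetic on delays. The following Python sketch stores each entry $z^{-a}$ as its delay $a$ and a zero entry as `math.inf`, following the $z^{-\infty}$ convention of Section 3.3. It then applies $D = \mathrm{diag}(z^{-3}, z^{-2}, z^{-1}, z^0)$ to design C. The encoding of $A$, $b$, $c$ is our reading of Figure 2-9, and the helper name `transform` is hypothetical.

```python
import math

INF = math.inf  # delay of a zero entry, since 0 = z^(-inf)


def transform(A, b, c, d):
    """Delays of D A D^-1, D b and c^T D^-1 for D = diag(z^-d1, ..., z^-dn)."""
    n = len(d)
    # (D A D^-1)_ij = z^-d_i * z^-a_ij * z^d_j  ->  delay d_i + a_ij - d_j
    A2 = [[d[i] + A[i][j] - d[j] for j in range(n)] for i in range(n)]
    # (D b)_i = z^-d_i * z^-b_i  ->  delay d_i + b_i
    b2 = [d[i] + b[i] for i in range(n)]
    # (c^T D^-1)_j = z^-c_j * z^d_j  ->  delay c_j - d_j (negative means an advance)
    c2 = [c[j] - d[j] for j in range(n)]
    return A2, b2, c2


# Design C (our reading of Figure 2-9): v1 <- z^0 v2 + z^-3 x, ..., v4 <- z^0 x, y = z^0 v1
A = [[INF, 0, INF, INF],
     [INF, INF, 0, INF],
     [INF, INF, INF, 0],
     [INF, INF, INF, INF]]
b = [3, 2, 1, 0]
c = [0, INF, INF, INF]
d = [3, 2, 1, 0]

A2, b2, c2 = transform(A, b, c, d)
print(A2)  # delay 1 on the superdiagonal, inf elsewhere
print(b2)  # [6, 4, 2, 0]
print(c2)  # [-3, inf, inf, inf]
```

Every delay between nodes becomes 1, which is the $z^{-1}$ superdiagonal of $DAD^{-1}$. The delay $-3$ in `c2` is the $z^{3}$ of $c^T D^{-1}$.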

Figure 2-11. Design corresponding to (2.10) and (2.11) in the z-graph representation.

We have transformed design C of Figure 2-8 into the design of Figure 2-11. Since $c^T D^{-1} = (z^3, 0, 0, 0)$, (2.11) says that $y(t) = u_1(t+3)$. After renaming the value of output $y$ at time $t$ to be that of output $y$ at time $t+3$, the design becomes exactly the systolic design W2 of Figure 2-5. In conclusion, we have derived a systolic design by applying a transformation $D$ to the algebraic representation of a non-systolic design.
3. Foundation for Algebraic Transformations

In Section 2.4 we illustrated that a systolic design could be derived by formal algebraic manipulations similar to those used in linear algebra. This section provides a mathematical foundation for these formal manipulations. To do so, we first need to give a semantics for VLSI algorithm designs.

3.1. Semantics of Design

We define the semantics of a design to be the function of time that the design implements. More precisely, the semantics of some basic design constructs, given in either the z-graph representation or the algebraic representation, are summarized in the table of Figure 3-1, with the following comments:

Figure 3-1. Semantics of basic design constructs (columns: syntax in the z-graph representation, syntax in the algebraic representation, and semantics).

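1. Each node $v_i$ in the z-graph representation, or each variable $v_i$ in the algebraic representation, is a function of time defined in terms of some implicit function $f_i$ associated with $v_i$.

2. If node or variable $v_i$ has incoming edges labeled $z^{-a_{ij}}$ from $v_j$, $z^{-a_{ik}}$ from $v_k$, and $z^{-b_i}$ from input $x$, then its value at time $t$ is $v_i(t) = f_i[v_j(t-a_{ij}), v_k(t-a_{ik}), x(t-b_i)]$, where $v_j(t-a_{ij})$ is the value of $v_j$ at time $t-a_{ij}$, $v_k(t-a_{ik})$ is the value of $v_k$ at time $t-a_{ik}$, and $x(t-b_i)$ is the value of input $x$ at time $t-b_i$.

3. If output $y$ is connected to $v_i$ with label $z^{-k}$, the value of output $y$ at time $t$ is the same as the value of $v_i$ at time $t-k$. (If $k = 0$, the symbol $z^0$ can be omitted from the z-graph representation, as in Figures 2-5 and 2-8.)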




Note that for the designs of Figures 2-8 and 2-11, the implicit function $f_i$, $i = 1, 2, 3$, associated with node $v_i$ or $u_i$ with weight $w_i$, is defined by

$$f_i[a, b] = a + w_i b,$$

and the implicit function $f_4$ associated with node $v_4$ or $u_4$ is defined by

$$f_4[b] = w_4 b,$$

where $a$ and $b$ are the left and top inputs to the node, respectively. Note that the implicit functions $f_i$ are independent of time. As far as the algebraic transformations of this paper are concerned, the semantics of the implicit functions need not be specified, as they are invariant under these transformations. This is the reason why we call them implicit functions.

3.2. Canonical Algebraic Representation

As shown in Figures 2-5 and 2-8, a general design in the z-graph representation has input $x$, output $y$ and nodes $v_1, \ldots, v_n$. By grouping the multiple expressions that define the individual functions $v_1, \ldots, v_n$ into a single matrix expression, the algebraic representation of a general VLSI algorithm design often has the form

$$v \leftarrow Av + bx, \qquad (3.1)$$

$$y = c^T v, \qquad (3.2)$$

where $A = (z^{-a_{ij}})$ is an $n \times n$ matrix, $b = (z^{-b_1}, \ldots, z^{-b_n})^T$, $v = (v_1, \ldots, v_n)^T$, and $c^T = (z^{-c_1}, \ldots, z^{-c_n})$ with only one nonzero entry. This canonical form of the algebraic representation has been illustrated by Figures 2-9 and 2-10, and will be assumed in the rest of the paper except in the concluding remarks section.

3.3. Well-Defined Design and Equivalent Designs

For $i = 1, \ldots, n$, the $i$-th component of (3.1) is

$$v_i \leftarrow z^{-a_{i1}} v_1 + z^{-a_{i2}} v_2 + \cdots + z^{-a_{in}} v_n + z^{-b_i} x. \qquad (3.3)$$

That is, (3.1) is a collection of expressions (3.3) for $i = 1, \ldots, n$. In defining the semantics of design (3.1) and (3.2), (3.3) means that function $v_i$ satisfies

$$v_i(t) = f_i[v_1(t - a_{i1}), v_2(t - a_{i2}), \ldots, v_n(t - a_{in}), x(t - b_i)] \qquad (3.4)$$

for some implicit function $f_i$ associated with node $v_i$, and (3.2) means that

$$y(t) = v_j(t - c_j),$$

where $-c_j$ is the exponent of the only nonzero entry in vector $c^T$. (Mechanically, we can think that in the transformation from (3.3) to (3.4), "$\leftarrow$" is replaced with "$= f_i$".) Here we use the convention that a zero entry of $A$, $b$ or $c^T$ is $z^{-\infty}$, and it is omitted from expressions (3.3) and (3.4).
We say that a design is well-defined starting from some $t_0$ if for $i = 1, \ldots, n$ and $t \ge t_0$, $v_i(t)$ is completely determined by values in the sets $\{x(t') : t' \le t\}$ and $\{v_j(t') : t' < t\}$, $j = 1, \ldots, n$, and this property holds for any implicit functions. In view of (3.4), a sufficient condition for design (3.1) to be well-defined is that the $a_{ij}$'s are all positive. This is, however, not a necessary condition. It is instructive to see that design C of Figure 2-9 is well-defined in spite of the fact that for this design $a_{12} = a_{23} = a_{34} = 0$. From Figure 2-9, we have

$$v_1(t) = f_1[v_2(t), x(t-3)],$$
$$v_2(t) = f_2[v_3(t), x(t-2)],$$
$$v_3(t) = f_3[v_4(t), x(t-1)],$$
$$v_4(t) = f_4[x(t)].$$

Therefore

$$v_1(t) = f_1[f_2[f_3[f_4[x(t)], x(t-1)], x(t-2)], x(t-3)],$$
$$v_2(t) = f_2[f_3[f_4[x(t)], x(t-1)], x(t-2)],$$
$$v_3(t) = f_3[f_4[x(t)], x(t-1)],$$
$$v_4(t) = f_4[x(t)].$$

We see that for $i = 1, \ldots, 4$, $v_i(t)$ is completely determined by values in the set $\{x(t') : t' \le t\}$ for any implicit functions $f_i$, and thus design C is well-defined. It is easy to prove that a necessary and sufficient condition for a design to be well-defined is that its z-graph representation contains no cycle whose edges all have label $z^0$. Verifying this condition for a design can be done in linear time, since it amounts to checking that the subgraph formed by the $z^0$ edges is acyclic. Hereafter we are only interested in designs that are well-defined.
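The linear-time check can be sketched with Kahn's topological sort restricted to the $z^0$ edges. As a by-product, the sketch reports the longest chain of nodes linked by $z^0$ edges. Under the simplifying assumption that every node costs one time unit, that length is a proxy for the cycle time discussed in Section 2. Edges are hypothetical `(source, destination, delay)` triples, with delay $a$ standing for label $z^{-a}$. Edges from the input $x$ are ignored because $x$ cannot lie on a cycle.

```python
from collections import defaultdict, deque


def zero_delay_depth(nodes, edges):
    """Longest chain of nodes joined by z^0 edges, or None if a z^0 cycle exists.

    edges: (src, dst, delay) triples meaning dst depends on src with label z^-delay.
    Runs in O(len(nodes) + len(edges)) time (Kahn's algorithm).
    """
    succ = defaultdict(list)
    indeg = {v: 0 for v in nodes}
    for src, dst, delay in edges:
        if delay == 0 and src in indeg and dst in indeg:  # skip edges from input x
            succ[src].append(dst)
            indeg[dst] += 1

    depth = {v: 1 for v in nodes}
    queue = deque(v for v in nodes if indeg[v] == 0)
    seen = 0
    while queue:
        v = queue.popleft()
        seen += 1
        for nxt in succ[v]:
            depth[nxt] = max(depth[nxt], depth[v] + 1)
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                queue.append(nxt)

    if seen < len(nodes):  # some node sits on a cycle of z^0 edges
        return None
    return max(depth.values(), default=0)


nodes = ["v1", "v2", "v3", "v4"]
design_c = [("v2", "v1", 0), ("v3", "v2", 0), ("v4", "v3", 0),
            ("x", "v1", 3), ("x", "v2", 2), ("x", "v3", 1), ("x", "v4", 0)]
design_w2 = [("v2", "v1", 1), ("v3", "v2", 1), ("v4", "v3", 1)]  # input edges omitted
print(zero_delay_depth(nodes, design_c))   # 4: well-defined, but not systolic
print(zero_delay_depth(nodes, design_w2))  # 1
print(zero_delay_depth(["a", "b"], [("a", "b", 0), ("b", "a", 0)]))  # None
```

A cycle that contains at least one positive delay is accepted. Only cycles made entirely of $z^0$ edges make a design ill-defined.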

Consider a well-defined design (3.1), with some implicit function associated with each node. Given an input function (of time) $x$ and initial values $v_i(t)$ for $t < t_0$, by (3.4) design (3.1) defines a unique vector function (of time) $v = (v_1, \ldots, v_n)^T$,* and, together with (3.2), defines a unique output function (of time) $y$. We say that two output functions $A$ and $B$ are essentially the same if $A(t) = B(t + \alpha)$, where $\alpha$ is some constant, for all $t$ greater than a certain time instant.

Definition 3.1: Two given designs are equivalent if, for any initial values given for one design, there exist initial values for the other design such that, with the same input function, the two designs produce essentially the same output function.

In the following section we will show that the design defined by (2.5) and (2.6) and the one defined by (2.10) and (2.11) are equivalent.


* In the semantics literature, the function $v$ thus defined is called the "fixpoint solution" of the "fixpoint equation" (3.1).




3.4. Fundamental Results

To express our results on algebraic transformations, we need the following definitions. Let $D = (z^{-d_i})$ be an $n \times n$ diagonal matrix.

1. For $v = (v_1, \ldots, v_n)^T$, define $Dv$ to be $u = (u_1, \ldots, u_n)^T$ such that for $i = 1, \ldots, n$,

$$u_i(t) = v_i(t - d_i)$$

for all $t$ for which $v_i(t - d_i)$ is defined. Thus, $D$ can be viewed as an operator that maps a vector function $v$ to another vector function $Dv$.

2. For $b = (z^{-b_1}, \ldots, z^{-b_n})^T$, define $Db$ to be $e = (z^{-e_1}, \ldots, z^{-e_n})^T$, where

$$e_i = d_i + b_i$$

for $i = 1, \ldots, n$.

3. Let $A = (z^{-a_{ij}})$ be an $n \times n$ matrix. Define $DA$ to be an $n \times n$ matrix $B = (z^{-b_{ij}})$, where

$$b_{ij} = d_i + a_{ij}$$

for $i, j = 1, \ldots, n$. The product $AD^{-1}$ is defined similarly; its entries are $z^{-(a_{ij} - d_j)}$. We can easily check that

$$(DA)D^{-1} = D(AD^{-1}),$$

and thus we can simply denote them by $DAD^{-1}$.

Here we use the convention that

$$\infty = d_i + \infty$$

for any $d_i$. Thus zero entries of $b$ or $A$ remain zero entries in $Db$ or $DA$, respectively.
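The three definitions can be sketched in Python under the same delay encoding, with entry $z^{-a}$ stored as $a$ and a zero entry as `math.inf`. `shift` realizes definition 1 on vector functions, and the two matrix helpers realize definition 3. The checks at the end illustrate $(DA)D^{-1} = D(AD^{-1})$ and show that zero entries stay zero. All helper names are hypothetical.

```python
import math
import random

INF = math.inf  # delay of a zero entry (0 = z^-inf)


def shift(d, v):
    """Definition 1: (Dv)_i(t) = v_i(t - d_i) for a list v of functions of time."""
    return [lambda t, vi=vi, di=di: vi(t - di) for vi, di in zip(v, d)]


def left_mul(d, A):
    """Definition 3: DA has delays d_i + a_ij (INF stays INF)."""
    return [[d[i] + a for a in row] for i, row in enumerate(A)]


def right_mul_inv(A, d):
    """A D^-1 has delays a_ij - d_j (INF stays INF)."""
    return [[a - d[j] for j, a in enumerate(row)] for row in A]


# Definition 1 on two concrete signals.
u = shift([2, -1], [lambda t: t * t, lambda t: 10 * t])
print(u[0](5), u[1](5))  # 9 60

# Associativity and preservation of zero entries on a random 4x4 example.
random.seed(0)
n = 4
A = [[random.choice([INF, 0, 1, 2]) for _ in range(n)] for _ in range(n)]
d = [random.randint(-3, 3) for _ in range(n)]
lhs = right_mul_inv(left_mul(d, A), d)  # (DA)D^-1
rhs = left_mul(d, right_mul_inv(A, d))  # D(AD^-1)
print(lhs == rhs)  # True
print(all((lhs[i][j] == INF) == (A[i][j] == INF) for i in range(n) for j in range(n)))  # True
```

Representing zero entries by floating-point infinity makes the convention $\infty = d_i + \infty$ hold automatically, because `inf + k` and `inf - k` are both `inf` for any finite `k`.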

Lemma 3.1: Suppose that $v$ and $u$ are defined by the well-defined designs

$$v \leftarrow Av + bx \qquad (3.5)$$

and

$$u \leftarrow (DAD^{-1})u + (Db)x, \qquad (3.6)$$

with their initial values satisfying

$$u_i(t + d_i) = v_i(t) \qquad (3.7)$$

for $t < t_0$. Then

$$u = Dv.$$

Proof: Let $v_i$ and $u_i$ be the $i$-th components of $v$ and $u$, respectively. Note that $DAD^{-1} = (z^{-(d_i + a_{ij} - d_j)})$ and $Db = (z^{-(d_1 + b_1)}, \ldots, z^{-(d_n + b_n)})^T$. Thus, $u_i$ defined by (3.6) satisfies

$$u_i(t) = f_i[u_1(t - d_i + d_1 - a_{i1}), \ldots, u_n(t - d_i + d_n - a_{in}), x(t - b_i - d_i)].$$

Replacing $t$ with $t + d_i$ in the above equation, we have

$$u_i(t + d_i) = f_i[u_1(t + d_1 - a_{i1}), u_2(t + d_2 - a_{i2}), \ldots, u_n(t + d_n - a_{in}), x(t - b_i)]. \qquad (3.8)$$

By (3.4),

$$v_i(t) = f_i[v_1(t - a_{i1}), v_2(t - a_{i2}), \ldots, v_n(t - a_{in}), x(t - b_i)]. \qquad (3.9)$$

We prove by induction on $t$ that for $i = 1, \ldots, n$,

$$u_i(t + d_i) = v_i(t) \qquad (3.10)$$

for $t = t_0, t_0 + 1, t_0 + 2, \ldots$. By (3.7), (3.10) holds for $t < t_0$. Thus,

$$u_j(t_0 + d_j - a_{ij}) = v_j(t_0 - a_{ij})$$

for any $j$ for which $a_{ij} > 0$. Since designs (3.5) and (3.6) are well-defined, (3.8) and (3.9) imply that

$$u_i(t_0 + d_i) = v_i(t_0),$$

that is, (3.10) holds for $t = t_0$. By induction, (3.10) holds for $t = t_0 + 1, t_0 + 2$, and so on. $\square$

The following lemma can be proven by a similar method.

Lemma 3.2: If

$$y = c^T v \quad \text{and} \quad u = Dv,$$

then

$$y = (c^T D^{-1})u.$$

Immediately following from Lemmas 3.1 and 3.2, we have the following result:

Theorem 3.1: Design

$$v \leftarrow Av + bx, \qquad y = c^T v$$

is equivalent to design

$$u \leftarrow (DAD^{-1})u + (Db)x, \qquad y = (c^T D^{-1})u,$$

assuming that both designs are well-defined.
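Theorem 3.1 can be spot-checked by simulation. The sketch below evaluates any canonical design directly from (3.4), memoizing $v_i(t)$. It compares design C with its transformed version under $D = \mathrm{diag}(z^{-3}, z^{-2}, z^{-1}, z^0)$, choosing the transformed design's initial values by (3.7). To stress that the result does not depend on the implicit functions, it deliberately uses an arbitrary nonlinear $f_i$ rather than $a + w_i b$. Names such as `canonical_output`, and the particular $f_i$, $x$ and initial values, are assumptions made for illustration.

```python
import math
from functools import lru_cache

INF = math.inf  # delay of a zero entry


def canonical_output(A, b, c, f, x, start, init):
    """Output y(t) of v <- Av + bx, y = c^T v, evaluated from (3.4).

    Entry delay a means z^-a (INF means a zero entry). Node i uses init[i](t)
    for t < start[i] and its implicit function f[i] afterwards. Assumes the
    design is well-defined, so the recursion terminates.
    """
    n = len(A)

    @lru_cache(maxsize=None)
    def v(i, t):
        if t < start[i]:
            return init[i](t)
        args = [v(j, t - A[i][j]) for j in range(n) if A[i][j] != INF]
        if b[i] != INF:
            args.append(x(t - b[i]))
        return f[i](tuple(args))

    j = next(k for k in range(n) if c[k] != INF)  # the single nonzero entry of c^T
    return lambda t: v(j, t - c[j])


def implicit(i):
    # An arbitrary nonlinear implicit function for node i.
    return lambda args: (sum((k + 2) * a * a for k, a in enumerate(args)) + i) % 1009


def x(t):
    return (7 * t + 3) % 11  # an arbitrary input function


A = [[INF, 0, INF, INF], [INF, INF, 0, INF], [INF, INF, INF, 0], [INF] * 4]
b, c, d, t0 = [3, 2, 1, 0], [0, INF, INF, INF], [3, 2, 1, 0], 0
f = [implicit(i) for i in range(4)]
v_init = [lambda t, i=i: (31 * i + t) % 17 for i in range(4)]

# Transformed design (D A D^-1, D b, c^T D^-1) with initial values chosen by (3.7).
A2 = [[d[i] + A[i][j] - d[j] for j in range(4)] for i in range(4)]
b2 = [d[i] + b[i] for i in range(4)]
c2 = [c[j] - d[j] for j in range(4)]
u_init = [lambda s, i=i: v_init[i](s - d[i]) for i in range(4)]

y_v = canonical_output(A, b, c, f, x, [t0] * 4, v_init)
y_u = canonical_output(A2, b2, c2, f, x, [t0 + di for di in d], u_init)
print(all(y_v(t) == y_u(t) for t in range(50)))  # True
```

The simulation is only evidence for one choice of $D$, $f_i$ and inputs. The proof of Lemmas 3.1 and 3.2 is what covers all choices.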
The above theorem is essentially the "retiming lemma" of Leiserson and Saxe [17]. Not using the algebraic notation and approach taken here, they had to rely on a very long (4 pages) and rather unclean proof.

In the following, we introduce another transformation, whose function is to scale down the throughput of an existing design. Consider a well-defined design $M$ with input function $x$ and output function $y$, and another design $M'$ with input function $x'$ and output function $y'$. We say that design $M'$ is a $k$-slowed design of $M$, for some positive integer $k$, if the following holds for some integer $p$:

for any initial values for $M$, there exist initial values for $M'$ such that if

$$x'(kt + p) = x(t)$$

for all $t$, then

$$y'(kt + p) = y(t)$$

for all $t$ where $y(t)$ is defined.

Therefore, as far as the outside world is concerned, the function of a $k$-slowed design is the same as that of the original design, except that input and output are taken in and out, respectively, once every $k$ time units. The usefulness of $k$-slowed designs in the derivation of systolic designs was first pointed out in [17], and will become clear in the next two sections. The following lemma shows a simple way to implement a well-defined $k$-slowed design.

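Lemma 3.3: If

$$v \leftarrow Av + bx, \qquad (3.11)$$

$$y = c^T v$$

is a well-defined design, then the design

$$v' \leftarrow A'v' + b'x', \qquad (3.12)$$

$$y' = c'^T v', \qquad (3.13)$$

with $A' = (z^{-k a_{ij}})$, $b' = (z^{-k b_1}, \ldots, z^{-k b_n})^T$ and $c'^T = (z^{-k c_1}, \ldots, z^{-k c_n})$, is a well-defined $k$-slowed design.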

Proof: Since in their z-graph representations the two designs have the same set of edges with label $z^0$, well-definedness of one design implies that of the other. Let $v_j$ and $v'_j$ be the $j$-th components of $v$ and $v'$, respectively. Without loss of generality, assume that the output functions $y$ and $y'$ of the two designs satisfy

$$y(t) = v_j(t - c_j) \quad \text{and} \quad y'(t) = v'_j(t - k c_j),$$

respectively. Suppose that the original design is well-defined starting from $t_0$. It suffices to prove that if

$$x'(kt) = x(t)$$

for all $t$, and

$$v'_i(kt) = v_i(t)$$

for $i = 1, \ldots, n$ and $t < t_0$, then

$$y'(kt) = y(t) \qquad (3.14)$$

for all $t$ for which $y(t)$ is defined. The proof is similar to that of Lemma 3.1 and is omitted.
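The idea behind Lemma 3.3, multiplying every delay by $k$, can be illustrated on a single recurrence. The sketch uses the IIR filter of Section 5, which we read as $y(t) = w_1 y(t-1) + w_2 y(t-2) + w_3 x(t) + w_4 x(t-1)$. It is written as one loop rather than in the four-node canonical form. The sketch scales all delays by $k$ and checks $y'(kt) = y(t)$ whenever $x'(kt) = x(t)$, so $p = 0$. Between the sampled instants the slowed input is filled with arbitrary values, to show that they never reach the sampled outputs. The function name, weights and signals are made up for the example, and all initial values are taken to be 0.

```python
def run_filter(w, x, steps, k=1):
    """y(t) = w1*y(t-k) + w2*y(t-2k) + w3*x(t) + w4*x(t-k) for t = 0..steps-1.

    k = 1 is the original recurrence; k > 1 multiplies every delay by k,
    as in the k-slowed design of Lemma 3.3. Initial values y(t), t < 0, are 0.
    """
    w1, w2, w3, w4 = w
    y = {}
    for t in range(steps):
        y[t] = (w1 * y.get(t - k, 0) + w2 * y.get(t - 2 * k, 0)
                + w3 * x(t) + w4 * x(t - k))
    return y


w = (2, -1, 3, 5)
k = 3


def x(t):
    return (3 * t + 1) % 7  # original input, defined for every integer t


def x_slow(t):
    # x'(kt) = x(t); the values in between are arbitrary junk.
    return x(t // k) if t % k == 0 else 1000 + t


y = run_filter(w, x, 20)
y_slow = run_filter(w, x_slow, 20 * k, k=k)
print(all(y_slow[k * t] == y[t] for t in range(20)))  # True
```

The slowed design effectively runs $k$ independent copies of the computation, interleaved in time. Only the copy that sees the samples $x'(kt)$ produces meaningful output, which is why throughput drops by a factor of $k$.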
4. Determining Algebraic Transformations

Given a well-defined design, we want to determine a $k$-slowed design and a matrix $D$ such that the design

$$u \leftarrow (DA'D^{-1})u + (Db')x', \qquad y' = (c'^T D^{-1})u$$

will be well-defined and systolic. This imposes the following conditions on the entries of $DA'D^{-1}$ and $Db'$:

C1. For $i, j = 1, \ldots, n$,

$$d_i + k a_{ij} - d_j \ge 1.$$

{This assures not only that the design is well-defined, but also that the cycle time only has to be long enough to perform the computation of at most one node.}

C2. All nonzero entries of any column of $DA'D^{-1}$ and $Db'$ must be distinct.

{This assures that the value of a node at any time never has to be sent to more than one node simultaneously, and thus no broadcasting or fanout of data is needed.}

It is an easy exercise to show that if the original design is well-defined, that is, if its z-graph representation contains no cycle whose edges all have label $z^0$, then there exist $k$ and $D$ for which conditions C1 and C2 are satisfied. To maximize throughput, we are interested in a solution with the smallest possible $k$. It turns out that for some designs to satisfy C1 and C2, $k$ must be greater than one, as will be illustrated by the IIR filtering example in the next section. This is the reason why we perform transformations on a $k$-slowed design, with $k \ge 1$, rather than on the original design.
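Conditions C1 and C2 are easy to check mechanically. The sketch below takes the delays of the original design (entry $z^{-a}$ stored as $a$, zero entry as `math.inf`), a slow-down factor $k$, and the exponents $d_i$ of $D$, and tests both conditions. It reads C2 as requiring distinct delays within each column of $DA'D^{-1}$ and, separately, within $Db'$. That reading and the function name are our assumptions. As a usage example, it applies the check to design C with $k = 1$ and $D = \mathrm{diag}(z^{-3}, z^{-2}, z^{-1}, z^0)$ from Section 2.4.

```python
import math

INF = math.inf  # delay of a zero entry


def check_c1_c2(A, b, k, d):
    """Return (C1 holds, C2 holds) for D = diag(z^-d_i) applied to the k-slowed design."""
    n = len(A)
    # Delays of D A' D^-1, where A' has delays k * a_ij.
    M = [[d[i] + k * A[i][j] - d[j] for j in range(n)] for i in range(n)]
    # Delays of D b', where b' has delays k * b_i.
    e = [d[i] + k * b[i] for i in range(n)]

    # C1: every edge between nodes carries at least one delay.
    c1 = all(M[i][j] >= 1 for i in range(n) for j in range(n) if M[i][j] != INF)

    def distinct(values):
        finite = [val for val in values if val != INF]
        return len(finite) == len(set(finite))

    # C2: no value is needed by two nodes at the same time (no fanout).
    c2 = all(distinct([M[i][j] for i in range(n)]) for j in range(n)) and distinct(e)
    return c1, c2


# Design C of Section 2 (entries z^-a stored as a).
A = [[INF, 0, INF, INF], [INF, INF, 0, INF], [INF, INF, INF, 0], [INF] * 4]
b = [3, 2, 1, 0]
print(check_c1_c2(A, b, 1, [3, 2, 1, 0]))  # (True, True)
print(check_c1_c2(A, b, 1, [0, 0, 0, 0]))  # (False, True)
```

With $D$ equal to the identity, the $z^0$ chain of design C violates C1. With the $D$ of Section 2.4, every inter-node delay becomes 1 and both conditions hold.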


5. IIR Filtering: A Further Example

Consider the implementation of the following infinite impulse response (IIR) filter with weights $w_i$:

$$y_i = w_1 y_{i-1} + w_2 y_{i-2} + w_3 x_i + w_4 x_{i-1}. \qquad (5.1)$$

The above equation states that at any given time $t$, the value of output $y$ depends on the values of $y$ at times $t-1$ and $t-2$, and on input $x$ at times $t$ and $t-1$. Figure 5-1 depicts a straightforward design for the IIR filter in the z-notation.


Figure 5-1. Straightforward design for the IIR filter in the z-notation.

Similar to the FIR design of Figure 2-7, the 4-input adder of Figure 5-1 can be distributed over a cascade of four 2-input adders. This forms a design with four identical nodes, whose z-graph representation is depicted in Figure 5-2. Figure 5-3 describes the algebraic representation of the design.

Figure 5-2. IIR filter in the z-graph representation.

According to Lemma 3.3, a $k$-slowed design can be obtained by changing labels $z^{-h}$ to $z^{-kh}$ for any $h$. The algebraic representation of the $k$-slowed IIR filter is described in Figure 5-4, and is denoted by

$$v' \leftarrow A'v' + b'x', \qquad y' = c'^T v'.$$

Figure 5-3. IIR filter in the algebraic representation.

Figure 5-4. $k$-slowed IIR filter in the algebraic representation.

We seek a diagonal matrix $D$ such that the design described by

$$u \leftarrow (DA'D^{-1})u + (Db')x', \qquad y' = (c'^T D^{-1})u$$

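will be well-defined and systolic. By condition C1 of Section 4,

$$k \ge 1, \qquad d_1 - d_2 \ge 1, \qquad d_2 + 2k - d_1 \ge 1, \qquad d_2 - d_3 \ge 1, \qquad d_3 - d_4 \ge 1,$$

and by condition C2,

$$d_1 \ne k + d_2, \qquad d_3 \ne k + d_4.$$

With $k = 1$, C1 forces $d_1 - d_2 = 1$, which violates C2. One can check that a solution with the minimum possible value for $k$ is $k = 2$ and

$$D = \begin{pmatrix} z^{-2} & 0 & 0 & 0 \\ 0 & z^{-1} & 0 & 0 \\ 0 & 0 & z^{0} & 0 \\ 0 & 0 & 0 & z^{1} \end{pmatrix}.$$

Note that

$$DA'D^{-1} = \begin{pmatrix} z^{-2} & z^{-1} & 0 & 0 \\ z^{-3} & 0 & z^{-1} & 0 \\ 0 & 0 & 0 & z^{-1} \\ 0 & 0 & 0 & 0 \end{pmatrix}, \qquad Db' = \begin{pmatrix} 0 \\ 0 \\ z^{0} \\ z^{-1} \end{pmatrix},$$

and

$$c'^T D^{-1} = \begin{pmatrix} z^{2} & 0 & 0 & 0 \end{pmatrix}.$$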

Thus the resulting systolic IIR filtering array in the z-graph representation is shown in Figure 5-5. This systolic array was previously described in [11].
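The claim that $k = 2$ is the smallest feasible slow-down can be confirmed by a brute-force search over small integer exponents $d_i$. The sketch assumes the four-node IIR design reads $v_1 \leftarrow z^{-1} v_1 + z^{0} v_2$, $v_2 \leftarrow z^{-2} v_1 + z^{0} v_3$, $v_3 \leftarrow z^{0} v_4 + z^{0} x$, $v_4 \leftarrow z^{-1} x$, $y = v_1$. This is our reading of (5.1), not a transcription of Figure 5-3. The search range and function names are also assumptions. Adding a constant to every $d_i$ leaves $DA'D^{-1}$ unchanged, so the search may return a different but equally valid $D$. For that reason the exponents $d = (2, 1, 0, -1)$, i.e. $D = \mathrm{diag}(z^{-2}, z^{-1}, z^{0}, z^{1})$, are checked separately.

```python
import math
from itertools import product

INF = math.inf  # delay of a zero entry


def satisfies(A, b, k, d):
    """True if D = diag(z^-d_i) meets conditions C1 and C2 for the k-slowed design."""
    n = len(A)
    M = [[d[i] + k * A[i][j] - d[j] for j in range(n)] for i in range(n)]
    e = [d[i] + k * b[i] for i in range(n)]
    if any(M[i][j] < 1 for i in range(n) for j in range(n)):  # C1 (INF is never < 1)
        return False

    def distinct(values):
        finite = [val for val in values if val != INF]
        return len(finite) == len(set(finite))

    # C2: distinct delays within each column of D A' D^-1 and within D b'.
    return all(distinct([M[i][j] for i in range(n)]) for j in range(n)) and distinct(e)


def smallest_k(A, b, k_max=4, span=3):
    """Search k = 1, 2, ... and exponents d_i in [-span, span] for a feasible D."""
    for k in range(1, k_max + 1):
        for d in product(range(-span, span + 1), repeat=len(A)):
            if satisfies(A, b, k, d):
                return k, d
    return None


# Assumed IIR design: v1 <- z^-1 v1 + z^0 v2, v2 <- z^-2 v1 + z^0 v3,
#                     v3 <- z^0 v4 + z^0 x,  v4 <- z^-1 x,  y = v1.
A = [[1, 0, INF, INF],
     [2, INF, 0, INF],
     [INF, INF, INF, 0],
     [INF, INF, INF, INF]]
b = [INF, INF, 0, 1]

k, d = smallest_k(A, b)
print(k)                                   # 2
print(satisfies(A, b, 2, (2, 1, 0, -1)))  # True
```

For $k = 1$ the search fails for every $d$ in the range, matching the argument that C1 forces $d_1 - d_2 = 1$, which collides with C2 in the first column.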
6. Concluding Remarks

We proposed two representations for specifying a design: the z-graph representation and the algebraic representation. From either representation we can derive the other. The z-graph representation is readily mappable to a hardware or VLSI implementation, whereas the algebraic representation is suitable for algebraic transformations. For algebraic transformations, only the algebraic representations of designs are needed. By working within an algebraic framework, rather than a network or graph-theoretic framework, one can use powerful algebraic operators to manipulate designs and can deal with abstraction conveniently. For example, using matrix notation, a simple algebraic expression such as (3.1) can represent a design of arbitrary size.


A more general algebraic representation than the one described in (3.1) and (3.2) is

$$v \leftarrow Av + Bx, \qquad (6.1)$$

$$y = C^T v, \qquad (6.2)$$

where input $x$ and output $y$ are vectors rather than scalars, and $B$ and $C$ are matrices rather than vectors $b$ and $c$. This general form of representation seems to cover all the interesting VLSI algorithm designs that we know of and can anticipate. For example, the design of Figure 6-1(a) for multiplying a bidiagonal upper triangular matrix with a bidiagonal lower triangular matrix has this form, with sparse matrices $B$ and $C$ whose nonzero entries are $z^0$.


Without loss of generality, we can always assume that there is only one nonzero entry in each row of $C^T$, that is, at any time the value of each output $y_i$ is equal to that of some node at that time or earlier. The results and definitions of this paper can all be extended in a straightforward way to this general form of the algebraic representation, (6.1) and (6.2). For example, we can show that, starting with the non-systolic design of Figure 6-1(a), a systolic solution with the minimum possible value for $k$ is $k = 1$ with a diagonal matrix $D$ whose leading diagonal entries are $z^0$.

The resulting systolic array is illustrated in Figure 6-1(b), which is precisely the systolic design for band matrix multiplication proposed in [21]. Detailed discussions of this and other results, including the use of the proposed algebra in the derivation of two-level pipelined systolic arrays [13] and systolic arrays for priority queues and LU-decomposition of matrices, will appear in forthcoming papers.

Figure 6-1. Designs for band matrix multiplication in the z-graph representation: (a) a non-systolic design, and (b) a systolic design.


We view the major contributions of this paper as the proposed semantics for VLSI algorithm design, the algebraic representation and transformations, and the mathematical foundation for these transformations. With these algebraic tools, we are able to manipulate designs by "pushing symbols" as we do in algebra, and to prove theorems about design transformations (e.g., Theorem 3.1) without relying on any drawings. Deriving systolic designs is just one of many potential applications of the proposed algebra.